Machine translation and corpus-based methods
Author
Abstract
For this term paper in the course Natural Language Processing (NLP1) within GSLT, I have chosen to explore some NLP issues related to machine translation (MT). The development of MT systems shows that the aim is clearly not to replace human translators but to focus on domain-restricted mass translation and on systems for non-translators, new areas enabled by modern information technology. The availability of large corpora and of well-developed automatic corpus-based methods makes it possible to reduce one bottleneck of MT, the lack of knowledge.
Similar resources
Corpus based coreference resolution for Farsi text
Coreference resolution, that is, finding all expressions in a text that refer to the same entity, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Correction of Errors in a Modality Corpus Used for Machine Translation Using Machine-learning
We performed corpus correction on an annotated corpus for machine translation using machine-learning methods such as the maximum-entropy method. We thus constructed a high-quality annotated corpus based on corpus correction. We compared several different methods of corpus correction in our experiments and developed a suitable method for correction. Recently, corpus-based machine translation has ...
A'lam Corpus: A Standard Named-Entity Corpus for the Persian Language
Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used in text summarization, data mining, data retrieval, question answering, machine translation, and document classification systems. A NER system is tasked with determining the boundaries of each named entity, recognizing its type, and classifying it into predefined categories. The categories of named...
Correction of Errors in a Modality Corpus Used for Machine Translation by Using Machine-learning Method
We performed corpus correction on a modality corpus for machine translation by using such machine-learning methods as the maximum-entropy method. We thus constructed a high-quality modality corpus based on corpus correction. We compared several kinds of methods for corpus correction in our experiments and developed a good method for corpus correction.
A Survey on Statistical-based Parallel Corpus Alignment
Text alignment is an important process in various machine translation systems. The task consists of identifying correspondences between words, sentences, or paragraphs of a source text and their translation (a parallel corpus). There are two main approaches to parallel corpus alignment: statistical-based methods and lexical-based methods. In this paper, the main statistical-base...
Sub-Sentential Alignment Method by Analogy
This paper describes a method for finding word correspondences between pairs of translated sentences. In Example-Based Machine Translation, translation patterns can be extracted easily if word correspondences between pairs of translated sentences are defined. The popular methods for aligning a bilingual corpus at the sub-sentential level are unable to produce reliable results when the size of...